
Conversation

cynthiajoan
Collaborator

Description

Replace this paragraph with a description of what this PR is doing. If you're modifying existing behavior, describe the existing behavior, how this PR is changing it, and what motivated the change.

Related Issues

Replace this paragraph with a list of issues related to this PR from the issue database. Indicate which of these issues are resolved or fixed by this PR. Note that you'll have to prefix the issue numbers with flutter/flutter#.

Checklist

Before you create this PR, confirm that it meets all the requirements listed below by checking the relevant checkboxes ([x]).
This will ensure a smooth and quick review process. Updating pubspec.yaml and the changelogs is not required.

  • I read the Contributor Guide and followed the process outlined there for submitting PRs.
  • My PR includes unit or integration tests for all changed/updated/fixed behaviors (See Contributor Guide).
  • All existing and new tests are passing.
  • I updated/added relevant documentation (doc comments with ///).
  • The analyzer (melos run analyze) does not report any problems on my PR.
  • I read and followed the Flutter Style Guide.
  • I signed the CLA.
  • I am willing to follow up on review comments in a timely manner.

Breaking Change

Does your PR require plugin users to manually update their apps to accommodate your change?

  • Yes, this is a breaking change.
  • No, this is not a breaking change.

@cynthiajoan
Collaborator Author

/gemini summarize


This pull request introduces bidirectional transcription capabilities for Firebase AI. Key changes include:

  • New Configuration: Adds AudioTranscriptionConfig and integrates inputAudioTranscription and outputAudioTranscription into LiveGenerationConfig to enable transcription for both input and output audio streams.
  • Transcription Data Model: Introduces a Transcription class to represent transcription text and its completion status.
  • Live Server Content: Extends LiveServerContent to include inputTranscription and outputTranscription fields, allowing the live server to send transcription updates.
  • Example App Updates: The example application (bidi_page.dart) has been updated to display these new transcription messages in real-time, including logic to append new transcription segments to existing messages. The text field in MessageData was made mutable to facilitate these updates.

This feature enhances the live generation experience by providing real-time text representations of both user input and model output audio.
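
Based on the class and field names described above, enabling transcription on a live session might look like the following sketch (the constructor signatures are assumed from this summary, not verified against the published API):

    // Hypothetical usage sketch: AudioTranscriptionConfig and the two
    // LiveGenerationConfig fields are the names introduced in this PR.
    final liveConfig = LiveGenerationConfig(
      inputAudioTranscription: AudioTranscriptionConfig(),
      outputAudioTranscription: AudioTranscriptionConfig(),
    );

With both fields set, the live server should include inputTranscription and outputTranscription updates in its LiveServerContent messages.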

@cynthiajoan cynthiajoan changed the title feat(firebaseai): bidi transcript feat(firebaseai): add bidi transcript Oct 16, 2025
@cynthiajoan
Collaborator Author

/gemini review

@cynthiajoan cynthiajoan marked this pull request as ready for review October 16, 2025 22:13

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request adds support for bidirectional transcription in Firebase AI. The changes are well-structured, introducing new configurations and handling transcription messages. I've identified a few areas for improvement, including an unused variable, opportunities to reduce code duplication, and a suggestion to maintain immutability in a data class. My detailed comments and code suggestions aim to enhance code quality and maintainability.

Comment on lines +362 to +397
if (message.inputTranscription?.text != null) {
  final transcription = message.inputTranscription!;
  if (_inputTranscriptionMessageIndex != null) {
    // TODO(cynthia): find a better way to update the message
    _messages[_inputTranscriptionMessageIndex!].text =
        '${_messages[_inputTranscriptionMessageIndex!].text}${transcription.text!}';
  } else {
    _messages.add(MessageData(
        text: 'Input transcription: ${transcription.text!}',
        fromUser: true));
    _inputTranscriptionMessageIndex = _messages.length - 1;
  }
  if (transcription.finished ?? false) {
    _inputTranscriptionMessageIndex = null;
  }
  setState(_scrollDown);
}
if (message.outputTranscription?.text != null) {
  final transcription = message.outputTranscription!;
  if (_outputTranscriptionMessageIndex != null) {
    _messages[_outputTranscriptionMessageIndex!].text =
        '${_messages[_outputTranscriptionMessageIndex!].text}${transcription.text!}';
  } else {
    _messages.add(
      MessageData(
        text: 'Output transcription: ${transcription.text!}',
        fromUser: false,
      ),
    );
    _outputTranscriptionMessageIndex = _messages.length - 1;
  }
  if (transcription.finished ?? false) {
    _outputTranscriptionMessageIndex = null;
  }
  setState(_scrollDown);
}


medium

The logic for handling inputTranscription and outputTranscription is nearly identical. This duplication can be reduced by extracting the common logic into a private helper method. This would make the _handleLiveServerMessage method shorter, cleaner, and easier to maintain.
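
One possible shape for such a helper, sketched from the duplicated logic above (the name _appendTranscription and its parameters are illustrative suggestions, not code from this PR):

    // Appends a transcription segment to the message at messageIndex, or
    // starts a new message; returns the index to keep tracking, or null
    // once the transcription is finished.
    int? _appendTranscription(
      Transcription transcription,
      int? messageIndex, {
      required String label,
      required bool fromUser,
    }) {
      if (messageIndex != null) {
        _messages[messageIndex].text =
            '${_messages[messageIndex].text}${transcription.text!}';
      } else {
        _messages.add(MessageData(
            text: '$label: ${transcription.text!}', fromUser: fromUser));
        messageIndex = _messages.length - 1;
      }
      setState(_scrollDown);
      return (transcription.finished ?? false) ? null : messageIndex;
    }

Each call site would then reduce to a single assignment, e.g. _inputTranscriptionMessageIndex = _appendTranscription(transcription, _inputTranscriptionMessageIndex, label: 'Input transcription', fromUser: true);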

  });
  final Uint8List? imageBytes;
- final String? text;
+ String? text;


medium

Changing text to be a mutable field makes the MessageData class mutable. In Flutter, it's a best practice to use immutable data classes to prevent unexpected side effects and make state management more predictable.

I recommend reverting this to a final field and adding a copyWith method to MessageData. This allows you to create a new instance with updated data, which aligns better with declarative UI principles.

Here's an example of how you could implement copyWith:

class MessageData {
  // ... existing final properties

  MessageData copyWith({
    String? text,
    // ... other properties
  }) {
    return MessageData(
      text: text ?? this.text,
      // ... other properties
    );
  }
}
Suggested change
- String? text;
+ final String? text;

Comment on lines +394 to +418
Transcription? inputTranscription;
if (serverContentJson.containsKey('inputTranscription')) {
  final transcriptionJson =
      serverContentJson['inputTranscription'] as Map<String, dynamic>;
  inputTranscription = Transcription(
    text: transcriptionJson['text'] as String?,
    finished: transcriptionJson['finished'] as bool?,
  );
}
Transcription? outputTranscription;
if (serverContentJson.containsKey('outputTranscription')) {
  final transcriptionJson =
      serverContentJson['outputTranscription'] as Map<String, dynamic>;
  outputTranscription = Transcription(
    text: transcriptionJson['text'] as String?,
    finished: transcriptionJson['finished'] as bool?,
  );
}
return LiveServerContent(
  modelTurn: modelTurn,
  turnComplete: turnComplete,
  interrupted: interrupted,
  inputTranscription: inputTranscription,
  outputTranscription: outputTranscription,
);


low

The logic for parsing inputTranscription and outputTranscription is duplicated. To improve code quality and reduce redundancy, you can extract this logic into a local helper function within _parseServerMessage.

Suggested change
- Transcription? inputTranscription;
- if (serverContentJson.containsKey('inputTranscription')) {
-   final transcriptionJson =
-       serverContentJson['inputTranscription'] as Map<String, dynamic>;
-   inputTranscription = Transcription(
-     text: transcriptionJson['text'] as String?,
-     finished: transcriptionJson['finished'] as bool?,
-   );
- }
- Transcription? outputTranscription;
- if (serverContentJson.containsKey('outputTranscription')) {
-   final transcriptionJson =
-       serverContentJson['outputTranscription'] as Map<String, dynamic>;
-   outputTranscription = Transcription(
-     text: transcriptionJson['text'] as String?,
-     finished: transcriptionJson['finished'] as bool?,
-   );
- }
- return LiveServerContent(
-   modelTurn: modelTurn,
-   turnComplete: turnComplete,
-   interrupted: interrupted,
-   inputTranscription: inputTranscription,
-   outputTranscription: outputTranscription,
- );
+ Transcription? parseTranscription(String key) {
+   if (serverContentJson.containsKey(key)) {
+     final transcriptionJson =
+         serverContentJson[key] as Map<String, dynamic>;
+     return Transcription(
+       text: transcriptionJson['text'] as String?,
+       finished: transcriptionJson['finished'] as bool?,
+     );
+   }
+   return null;
+ }
+ return LiveServerContent(
+   modelTurn: modelTurn,
+   turnComplete: turnComplete,
+   interrupted: serverContentJson['interrupted'] as bool?,
+   inputTranscription: parseTranscription('inputTranscription'),
+   outputTranscription: parseTranscription('outputTranscription'),
+ );

Comment on lines +109 to +116
if (_liveGenerationConfig != null &&
_liveGenerationConfig.inputAudioTranscription != null)
'input_audio_transcription':
_liveGenerationConfig.inputAudioTranscription!.toJson(),
if (_liveGenerationConfig != null &&
_liveGenerationConfig.outputAudioTranscription != null)
'output_audio_transcription':
_liveGenerationConfig.outputAudioTranscription!.toJson(),


low

The check for _liveGenerationConfig != null is repeated. You can make this code cleaner and more readable by using a local variable and a collection-if with a spread operator.

        final liveConfig = _liveGenerationConfig;
        if (liveConfig != null) ...{
          if (liveConfig.inputAudioTranscription != null)
            'input_audio_transcription':
                liveConfig.inputAudioTranscription!.toJson(),
          if (liveConfig.outputAudioTranscription != null)
            'output_audio_transcription':
                liveConfig.outputAudioTranscription!.toJson(),
        },

